Summary

In the present paper we show that the method of
"approximation in policy space," developed in the
theory of dynamic programming, yields monotone
convergence in the calculus of variations.


MONOTONE CONVERGENCE IN DYNAMIC PROGRAMMING
AND THE CALCULUS OF VARIATIONS

Richard  Bellman 


§1. Introduction

In [1] we outlined some applications of the functional
equation approach of the theory of dynamic programming to the
characterization of extremal curves and eigenvalues in the
calculus of variations. A more detailed account of this new
formalism will be found in [2] and [3].

The purpose of the present note is to show that another
important concept in the theory of dynamic programming, that of
"approximation in policy space," may also be utilized to yield
some interesting results in the calculus of variations. As we
shall show below, this idea leads to the solution of variational
problems by iterative techniques which yield monotone
approximation, and indeed monotone convergence, as we shall show
elsewhere.

To illustrate this new concept, we shall consider first a
discrete dynamic programming problem, and then present the
analogous treatment of a continuous version, namely the maximization
of $J(y) = \int_0^T F(x,y,t)\,dt$, subject to $dx/dt = G(x,y,t)$,
$x(0) = c$. Following this we shall discuss the application of this
technique to the eigenvalue problem associated with the equation
$u'' + \lambda^2 \varphi(t) u = 0$, $u(0) = u(1) = 0$.

Finally, we shall sketch briefly a method that may be
used to demonstrate convergence of the iterative procedure.

§2. Monotone Convergence in Dynamic Programming

Let us consider the functional equation

$$f(x) = \max_{0 \le y \le x} \bigl[ g(y) + h(x-y) + f(ay + b(x-y)) \bigr] = T(f), \tag{2.1}$$


where $0 < a, b < 1$, which arises in connection with various types
of multi-stage allocation processes. It is readily shown that
if $g(x)$ and $h(x)$ are continuous in $x$ over an interval $[0, l]$,
with $g(0) = h(0) = 0$, then, starting with any initial function
$f_0(x)$ which is continuous over $[0, l]$ and zero at $x = 0$, the
iterative procedure $f_{n+1} = T(f_n)$ yields the unique solution
of (2.1) which is continuous at $x = 0$.

To obtain monotone convergence, we approximate first in
policy space. A policy is, with reference to (2.1), a choice
of $y = y(x)$ with $0 \le y(x) \le x$. Let $y_0 = y_0(x)$ be an initial
policy and let $f_0(x)$ be computed by recurrence from the functional
equation

$$f_0(x) = g(y_0) + h(x - y_0) + f_0(a y_0 + b(x - y_0)). \tag{2.2}$$


It is immediately clear that $f_1(x)$ as determined by

$$f_1(x) = T(f_0) \tag{2.3}$$

is greater than or equal to $f_0(x)$ for $x \ge 0$, that is,
$f_1(x) \ge f_0(x)$, since $y_0$ is one of the admissible choices
in the maximization.

From this it follows inductively that $f_{n+1} = T(f_n)$ is greater
than or equal to $f_n$ for all $x \ge 0$. Hence, we have monotone
convergence.
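The procedure of this section can be tried numerically. The following is a minimal sketch, not the author's computation: the functions g and h, the constants a and b, the grid `xs`, and the restriction to policies of the form y = theta * x are all assumptions made for illustration. It evaluates an initial policy by solving its linear functional equation on the grid, then applies a discretized T repeatedly and records the smallest increment.

```python
import numpy as np

# Hypothetical problem data, chosen only for illustration (not from the paper).
a, b = 0.6, 0.8                      # 0 < a, b < 1
g = lambda y: np.sqrt(y)             # continuous, g(0) = 0
h = lambda z: 0.5 * z                # continuous, h(0) = 0

xs = np.linspace(0.0, 1.0, 101)      # grid for x on [0, 1]
thetas = np.linspace(0.0, 1.0, 21)   # admissible choices y = theta * x


def one_step(f, theta):
    """g(y) + h(x - y) + f(a*y + b*(x - y)) on the grid, with y = theta * x."""
    y = theta * xs
    nxt = a * y + b * (xs - y)
    return g(y) + h(xs - y) + np.interp(nxt, xs, f)


def T(f):
    """Discretized operator T: maximize over the finite set of thetas."""
    return np.max([one_step(f, th) for th in thetas], axis=0)


def evaluate_policy(theta):
    """Solve f = g(y) + h(x - y) + f(a*y + b*(x - y)) for the fixed policy y = theta * x."""
    y = theta * xs
    nxt = a * y + b * (xs - y)
    r = g(y) + h(xs - y)
    n = len(xs)
    # Row i of P holds the linear-interpolation weights of the point nxt[i].
    P = np.array([np.interp(nxt, xs, e) for e in np.eye(n)]).T
    f = np.zeros(n)                  # f(0) = 0 is imposed directly
    f[1:] = np.linalg.solve(np.eye(n - 1) - P[1:, 1:], r[1:])
    return f


f = evaluate_policy(thetas[len(thetas) // 2])   # initial policy y_0(x) = x / 2
history = [f]
for _ in range(500):
    f_next = T(f)
    history.append(f_next)
    if np.max(np.abs(f_next - f)) < 1e-10:
        break
    f = f_next

# Smallest increment f_{n+1} - f_n over all steps and grid points.
gaps = [np.min(history[k + 1] - history[k]) for k in range(len(history) - 1)]
print(min(gaps))
```

Because linear interpolation uses non-negative weights and the initial policy belongs to the discrete choice set, the discretized operator keeps the monotone property of T, so the printed value should be non-negative up to rounding error.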


§3. Monotone Approximation in the Maximization of $\int F(x,y,t)\,dt$

As in [1] and [3], we write

$$f(a,c,T) = \max_{y} \int_a^{a+T} F(x,y,t)\,dt, \tag{3.1}$$

where $dx/dt = G(x,y,t)$, $x(a) = c$. The function $f$ satisfies
the functional equation

$$f_T = \max_{v} \bigl[ F(c,v,a) + G(c,v,a) f_c + f_a \bigr]. \tag{3.2}$$


As in our previous notes, we shall avoid all discussion of
necessary or sufficient conditions for these equations and
present only the basic formalism.
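In the same formal spirit, the step leading to (3.2) can be sketched as follows. This is only an outline, assuming that f is differentiable in a, c, and T; the small time increment $\Delta$ is notation introduced here and does not appear in the original.

```latex
\begin{align*}
f(a,c,T) &= \max_{v}\,\Bigl[\,F(c,v,a)\,\Delta
            + f\bigl(a+\Delta,\; c+G(c,v,a)\,\Delta,\; T-\Delta\bigr)\Bigr] + o(\Delta) \\
         &= \max_{v}\,\Bigl[\,F(c,v,a)\,\Delta + f(a,c,T)
            + \Delta\,\bigl(f_a + G(c,v,a)\,f_c - f_T\bigr)\Bigr] + o(\Delta).
\end{align*}
```

Cancelling $f(a,c,T)$, dividing by $\Delta$, and letting $\Delta \to 0$ gives $f_T = \max_v [F(c,v,a) + G(c,v,a) f_c + f_a]$, since $f_T$ does not depend on $v$.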

An approximation in policy space is now a choice of $y$ as
a function of $x$, $a$, and $T$, which is to say $v = y(a)$ as a function
of $c$, $a$, and $T$. Let $v_0 = v_0(c,a,T)$ represent an initial choice,
and let $f_0(a,c,T)$ denote the function obtained in this way. Then

$$f_0 = \int_a^{a+T} F(x_0,y_0,t)\,dt, \qquad dx_0/dt = G(x_0,y_0,t), \qquad x_0(a) = c,$$

and $f_0$ satisfies the partial differential equation

$$f_{0T} = F(c,v_0,a) + G(c,v_0,a) f_{0c} + f_{0a}, \qquad f_0(a,c,0) = 0. \tag{3.3}$$

A further approximation, $v_1$, to an optimal policy is now
determined by the condition that $v_1$ maximize the function of $v$
given by

$$H(v,f_0) = F(c,v,a) + G(c,v,a) f_{0c} + f_{0a}. \tag{3.4}$$

Let $f_1$ be the function determined by $v_1$, satisfying the
equation $f_{1T} = H(v_1,f_1)$. Similarly we determine $v_2$ by the
condition that it maximize $H(v,f_1)$, and so on, obtaining in this
way two sequences of functions $\{v_n\}$ and $\{f_n\}$.

Let us now demonstrate the essential result that the
sequence $\{f_n\}$ is monotone increasing in $n$ for all $a$, $c$, and
$T \ge 0$. We have

$$f_{1T} - f_{0T} = H(v_1,f_1) - H(v_0,f_0)
  = \bigl[H(v_1,f_0) - H(v_0,f_0)\bigr] + \bigl[H(v_1,f_1) - H(v_1,f_0)\bigr], \tag{3.5}$$

or

$$(f_1 - f_0)_T = A(c,T,a) + (f_1 - f_0)_c\, B(c,T,a) + (f_1 - f_0)_a, \tag{3.6}$$

where, from the manner in which $v_1$ was determined, $A \ge 0$.

From this, and the boundary condition at $T = 0$ for $f_0$ and
$f_1$, it follows readily that $f_1 - f_0 \ge 0$ for $T \ge 0$.

The same argument shows that $f_{n+1} \ge f_n$.
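The passage from (3.6) to $f_1 \ge f_0$ can be made explicit by integrating along characteristics. The following is a sketch only, assuming enough smoothness for the characteristic curves to exist; the difference $w$, the curve $\gamma(s)$, and the explicit forms given for $A$ and $B$ are notation introduced here, read off from (3.5) and not stated in the original.

```latex
\begin{align*}
w &= f_1 - f_0, \qquad w_T = A + B\,w_c + w_a, \qquad w(a,c,0) = 0, \\
A &= H(v_1,f_0) - H(v_0,f_0) \ge 0, \qquad B = G(c,v_1,a). \\
\intertext{Fix $(a,c,T)$ and let $\gamma(s)$ solve
$\gamma'(s) = B\bigl(\gamma(s),\,T-s,\,a+s\bigr)$, $\gamma(0) = c$, for $0 \le s \le T$. Then}
\frac{d}{ds}\, w\bigl(a+s,\ \gamma(s),\ T-s\bigr) &= w_a + B\,w_c - w_T = -A, \\
w(a,c,T) &= w\bigl(a+T,\ \gamma(T),\ 0\bigr)
           + \int_0^T A\bigl(\gamma(s),\,T-s,\,a+s\bigr)\,ds \;\ge\; 0 .
\end{align*}
```

The first term on the right vanishes by the boundary condition at $T = 0$, and the integrand is non-negative because $v_1$ maximizes $H(v,f_0)$.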

§4. Sketch of a Convergence Proof

The above argument yields monotone approximation without
discussing convergence. One approach to a proof of convergence
is to consider the corresponding discrete problem of maximizing
$\sum_{k=a}^{a+N} F(x_k,y_k,k)$ subject to $x_{k+1} - x_k = G(x_k,y_k,k)$,
$x_a = c$, which yields the functional equation for $f(a,c,N)$

$$f(a,c,N+1) = \max_{y} \bigl[ F(c,y,a) + f(a+1,\, c+G(c,y,a),\, N) \bigr], \tag{4.1}$$

and use a limiting process.
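The discrete recurrence (4.1) is easy to evaluate directly. The sketch below is illustrative only: the stage return `F`, the increment `G`, and the finite control set `CONTROLS` are hypothetical, and the base case assumes that the sum runs over the N + 1 stages k = a, ..., a + N, so that N = 0 is a single stage.

```python
from functools import lru_cache

# Hypothetical control set, stage return, and dynamics (illustration only).
CONTROLS = (-1, 0, 1)


def F(c, y, a):
    """Stage return: penalize distance of the state from 0 and control effort."""
    return -(c * c) - abs(y)


def G(c, y, a):
    """State increment: x_{k+1} - x_k = G(x_k, y_k, k)."""
    return y


@lru_cache(maxsize=None)
def f(a, c, N):
    """Maximum of the sum of F(x_k, y_k, k) for k = a, ..., a + N, with x_a = c."""
    if N == 0:
        # Assumed base case: one remaining stage.
        return max(F(c, y, a) for y in CONTROLS)
    # Recurrence (4.1), written with N in place of N + 1.
    return max(F(c, y, a) + f(a + 1, c + G(c, y, a), N - 1) for y in CONTROLS)


print(f(0, 2, 1))
```

Memoization via `lru_cache` keeps the cost proportional to the number of distinct triples (a, c, N) reached, rather than to the number of control sequences.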

§5. Monotone Convergence in Eigenvalue Problems

The problem of determining the values of $\lambda$ which permit
a nontrivial solution of

$$u'' + \lambda^2 \varphi(t) u = 0, \qquad u(0) = u(1) = 0 \tag{5.1}$$

is, under slight restrictions on $\varphi(t)$, equivalent to the problem
of determining the relative maxima of $\int_0^1 \varphi(t) u^2\,dt$ subject to the
constraints $\int_0^1 u'^2\,dt = 1$, $u(0) = u(1) = 0$. To attack this
problem by the functional equation method outlined above, we
consider the more general problem of determining the maximum
of


$$J(u) = \int_a^1 \varphi(t) u^2\,dt + k \int_a^1 (1-t)\, u\, \varphi(t)\,dt. \tag{5.2}$$


Setting $\max_u J(u) = f(a,k)$, we obtain (see [ ], [ ]) the equation


f^  -  Min  |^(f^f^/2)w*  -w  D2+k<l(a)/l^)3  J  (3.3) 

where 

1 

(1(a)  -  (5.4) 

a 

and $w = u'(a)$. A choice of a policy is a choice of $w = w(a,k)$.
The method of successive approximations used above may again
be employed and the proof of the monotonicity is essentially
as before.
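The variational characterization stated at the start of this section can be checked numerically. The sketch below is not the functional equation method of the text; it is a direct finite-difference discretization. It assumes the normalization constraint is that the integral of $u'^2$ over $[0, 1]$ equals 1, and the function name `smallest_lambda` and the grid size `n` are hypothetical.

```python
import numpy as np


def smallest_lambda(phi, n=200):
    """Estimate the smallest lambda > 0 of u'' + lambda^2 phi(t) u = 0, u(0) = u(1) = 0."""
    h = 1.0 / (n + 1)
    t = np.linspace(h, 1.0 - h, n)            # interior grid points
    # u @ K @ u approximates the integral of u'(t)^2 over [0, 1].
    K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h
    # u @ M @ u approximates the integral of phi(t) * u(t)^2 over [0, 1].
    M = h * np.diag(phi(t))
    # Stationary values of u M u subject to u K u = 1 satisfy M u = mu K u,
    # with mu = 1 / lambda^2; the largest mu gives the smallest lambda.
    mu = np.linalg.eigvals(np.linalg.solve(K, M)).real.max()
    return 1.0 / np.sqrt(mu)


# With phi identically 1 the equation is u'' + lambda^2 u = 0, whose smallest
# eigenvalue is lambda = pi (eigenfunction sin(pi t)).
lam = smallest_lambda(lambda t: np.ones_like(t))
print(lam, np.pi)
```

The largest stationary value of the discretized quotient corresponds to the smallest eigenvalue; the remaining eigenvalues of the matrix pencil approximate the other relative maxima in the same way.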